ABSTRACT

An approximation is given to calculate V, the covariance matrix for normal order statistics. The approximation gives considerable improvement over previous approximations, and the computing algorithm is available from the authors.


1.  INTRODUCTION 

Many statistical methods involve order statistics, and a proper study of these methods needs the covariance matrix V of a sample of order statistics. For a few important distributions (e.g., the uniform and the exponential), the entries can be expressed in closed form and calculated easily; but for most parent populations each $V_{ij}$ involves a double integral, so that accurate tabulation is difficult and expensive. In particular, for the normal population, V has so far been published only for samples of size n ≤ 20 (see, e.g., Sarhan and Greenberg, 1956; Owen, 1962). The need for good tables of V for many populations was pointed out by Hastings et al. (1947), who also stressed the magnitude of the problem of exact calculation. Subsequently, series expansions for $V_{ij}$ have been given by Plackett (1958) and by David and Johnson (1954). Saw (1960) compared these expansions and concluded that, although Plackett's series converges a little faster for a normal population, there are computational advantages in the David-Johnson method. The David-Johnson formulae include terms up to order $(n+2)^{-3}$.
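To show the structure of these expansions, the sketch below evaluates only the leading term of the David-Johnson series for a normal parent, $V_{ij} \approx p_i q_j / \{(n+2) f(x_i) f(x_j)\}$ for $i \le j$, where $p_i = i/(n+1)$, $q_j = 1 - p_j$, $x_i = \Phi^{-1}(p_i)$ and $f$ is the standard normal density. The corrections up to $(n+2)^{-3}$ are deliberately omitted, so this is far cruder than the formulae used in this paper. The function name `dj_leading_term` is our own, and Python's `statistics.NormalDist` stands in for the Cunningham (1969) inverse-normal algorithm.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def dj_leading_term(i: int, j: int, n: int) -> float:
    """Leading term only of the David-Johnson expansion of V_ij (1-based i, j).

    The higher-order terms, up to (n+2)^-3, are NOT included.
    """
    if i > j:
        i, j = j, i  # the formula is written for i <= j; V is symmetric
    p_i = i / (n + 1)
    p_j = j / (n + 1)
    q_j = 1.0 - p_j
    # Normal density at the quantiles x_i = Phi^{-1}(p_i) and x_j = Phi^{-1}(p_j)
    f_i = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p_i))
    f_j = _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p_j))
    return p_i * q_j / ((n + 2) * f_i * f_j)


if __name__ == "__main__":
    # For n = 3 the leading term gives about 0.371 for V_11, well below the
    # exact value of about 0.5595: the higher-order terms matter at the extremes.
    print(round(dj_leading_term(1, 1, 3), 4))
```

The large gap at $V_{11}$ even for this tiny sample is consistent with the later observation that $V_{11}$ is the least accurate entry of the expansion.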

In this paper we are concerned with V for normal order statistics. We give a technique by which one can obtain an excellent approximation for V, by starting with the values given by the David-Johnson formulae and modifying them by use of certain identities and specially tabulated values for normal order statistics.

Suppose $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the order statistics (in ascending order) of a sample of size n from a normal distribution with mean 0 and variance 1; let $m_i = E(X_{(i)})$, where E stands for expectation, and let V have entries $V_{ij} = E\{(X_{(i)} - m_i)(X_{(j)} - m_j)\}$.
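For small n these definitions can be checked numerically. The sketch below is our own illustration, not the method behind any of the tables cited in this paper: it integrates $x^k$ against the density of $X_{(i)}$, $\frac{n!}{(i-1)!(n-i)!}\Phi(x)^{i-1}\{1-\Phi(x)\}^{n-i}\phi(x)$, by composite Simpson's rule. The function names, the range $[-10, 10]$ and the number of steps are assumptions chosen for illustration. Only single integrals ($m_i$ and $V_{ii}$) are computed; the off-diagonal $V_{ij}$ need the double integrals mentioned above.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def order_stat_moment(i: int, n: int, k: int = 1,
                      lo: float = -10.0, hi: float = 10.0,
                      steps: int = 4000) -> float:
    """E(X_(i)^k) for the i-th (ascending, 1-based) of n standard normal order
    statistics, by composite Simpson's rule on [lo, hi]; `steps` must be even.
    The tails outside [-10, 10] are negligible for the moments used here."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        F = Phi(x)
        # x^k times the density of X_(i) at x
        g = x ** k * coef * F ** (i - 1) * (1.0 - F) ** (n - i) * phi(x)
        weight = 1 if s in (0, steps) else (4 if s % 2 == 1 else 2)
        total += weight * g
    return total * h / 3.0


if __name__ == "__main__":
    n = 3
    m3 = order_stat_moment(3, n)
    v33 = order_stat_moment(3, n, k=2) - m3 ** 2  # V_33 = E(X_(3)^2) - m_3^2
    print(f"m_3 = {m3:.6f}, V_33 = {v33:.6f}")
```

For n = 3 the results can be compared with the standard closed forms $m_3 = 3/(2\sqrt{\pi}) \approx 0.846284$ and $E\{X_{(3)}^2\} = 1 + \sqrt{3}/(2\pi)$, which give $V_{33} \approx 0.559467$.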

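Three useful identities are:

$$\sum_{j=1}^{n} V_{ij} = 1 \quad \text{for any } i; \qquad (1)$$

$$E\{X_{(1)}^2\} = E\{X_{(1)} X_{(2)}\} + 1; \qquad (2)$$

$$\operatorname{tr}(V) = n - \sum_{i=1}^{n} m_i^2. \qquad (3)$$

From (2), since $V_{11} = E\{X_{(1)}^2\} - m_1^2$ and $V_{12} = E\{X_{(1)} X_{(2)}\} - m_1 m_2$, we obtain

$$V_{12} = V_{11} + m_1^2 - m_1 m_2 - 1. \qquad (4)$$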


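Of these, (1) is very well known, (2) is given in, e.g., Govindarajulu (1963), and (3) is easily proved as follows:

$$\operatorname{tr}(V) = \sum_{i=1}^{n} E\{X_{(i)} - m_i\}^2 = \sum_{i=1}^{n} E\{X_{(i)}^2\} - \sum_{i=1}^{n} m_i^2 = n - \sum_{i=1}^{n} m_i^2,$$

since $\sum_i X_{(i)}^2$ equals the sum of squares of the n unordered observations, whose expectation is n.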
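As a check on (1), (3) and (4), the sketch below uses the closed forms available for n = 3, namely $m_3 = -m_1 = 3/(2\sqrt{\pi})$, $m_2 = 0$ and $E\{X_{(3)}^2\} = 1 + \sqrt{3}/(2\pi)$; these are standard results used only as a test case, not values from this paper. $V_{12}$ comes from (4), $V_{22}$ from (3) together with $V_{33} = V_{11}$ from the symmetry relations (5) below, and $V_{13}$ from (1). The helper names are our own.

```python
import math
from typing import Sequence


def v12_from_identity(v11: float, m1: float, m2: float) -> float:
    """Equation (4): V_12 = V_11 + m_1^2 - m_1 m_2 - 1."""
    return v11 + m1 * m1 - m1 * m2 - 1.0


def trace_target(m: Sequence[float]) -> float:
    """Equation (3): tr(V) = n - sum_i m_i^2."""
    return len(m) - sum(mi * mi for mi in m)


# Closed-form values for n = 3 (standard results, used only as a test case).
m3 = 3.0 / (2.0 * math.sqrt(math.pi))
m = [-m3, 0.0, m3]
v11 = 1.0 + math.sqrt(3.0) / (2.0 * math.pi) - m3 * m3  # V_11 = V_33 = E(X_(3)^2) - m_3^2

v12 = v12_from_identity(v11, m[0], m[1])
v22 = trace_target(m) - 2.0 * v11  # (3) with V_33 = V_11
v13 = 1.0 - v11 - v12              # (1) applied to row 1

print(f"V11={v11:.6f} V12={v12:.6f} V13={v13:.6f} V22={v22:.6f}")
# prints V11=0.559467 V12=0.275664 V13=0.164868 V22=0.448671
```

Row 2 then also sums to 1, since $V_{23} = V_{21}$ by (5) and $V_{22} + 2V_{12} = 1$ exactly for these values.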


We shall also need the results

$$V_{ij} = V_{ji} = V_{rs} = V_{sr}, \qquad r = n+1-i, \quad s = n+1-j, \qquad (5)$$

obtained from the symmetry of V and of the normal distribution about 0.
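In a program, (5) means that every computed entry can be stored in up to four positions at once. The helper below is a hypothetical sketch using 0-based indices, so that the 1-based $r = n+1-i$ becomes `n - 1 - i`.

```python
from typing import List


def fill_symmetric(V: List[List[float]], i: int, j: int, value: float) -> None:
    """Store value at V_ij and at its partners from (5).

    Indices are 0-based: the 1-based r = n + 1 - i becomes n - 1 - i.
    """
    n = len(V)
    r, s = n - 1 - i, n - 1 - j
    for a, b in ((i, j), (j, i), (r, s), (s, r)):
        V[a][b] = value


if __name__ == "__main__":
    V = [[0.0] * 4 for _ in range(4)]
    fill_symmetric(V, 0, 1, 0.25)  # V_12 in the 1-based notation of the text
    for row in V:
        print(row)
```

With n = 4, setting $V_{12}$ also sets $V_{21}$, $V_{43}$ and $V_{34}$.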


Values of $m_i$ have been extensively tabulated; e.g., for n ≤ 20, to 10 decimal places (d.p.) in Teichroew (1956), and, for all n ≤ 100 and at intervals for n ≤ 400, to 5 d.p. in Harter (1961). The sum $\sum m_i^2$ is given for n ≤ 100, to 5 d.p., in Pearson and Hartley (1972, Table 13) and in Owen (1962, p. 154). Ruben (1954) examines normal order statistics from a geometric viewpoint; among the results in his paper he gives moments of the extreme order statistic and tabulates the variance of $X_{(n)}$, i.e., $V_{nn}$, for n ≤ 50, to 8 d.p. Borenius (1966) has extended this tabulation to n ≤ 120. These exact values are important in obtaining a good approximation for V, since $V_{11}$ ($= V_{nn}$ by (5)) is the most inaccurate term in David and Johnson's formulae. LaBrecque (1973) used the David and Johnson technique, and the correct $V_{11}$, to calculate certain functions of $m' = (m_1, m_2, \ldots, m_n)$ and V. In the next section we use $V_{11}$ and the other identities above to give a considerable improvement over the David-Johnson formulae used alone.
By normalization of a row we shall mean keeping certain terms fixed and then multiplying all the others by a common constant, chosen so that the elements of the row sum to 1, as required by (1) above.
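Normalization as just defined can be sketched as follows. The function name and the convention of passing the fixed positions as a set of 0-based column indices are our own assumptions.

```python
from typing import Iterable, List


def normalize_row(V: List[List[float]], i: int, fixed: Iterable[int]) -> float:
    """Rescale the non-fixed entries of row i so that the row sums to 1, as (1) requires.

    Entries whose 0-based column index is in `fixed` are left unchanged; all the
    others are multiplied by one constant c, which is returned.
    """
    fixed = set(fixed)
    free = [j for j in range(len(V[i])) if j not in fixed]
    fixed_sum = sum(V[i][j] for j in fixed)
    free_sum = sum(V[i][j] for j in free)
    if not free or free_sum == 0.0:
        raise ValueError("row has no adjustable entries")
    c = (1.0 - fixed_sum) / free_sum
    for j in free:
        V[i][j] *= c
    return c


if __name__ == "__main__":
    V = [[0.5, 0.2, 0.1, 0.1]]
    print(normalize_row(V, 0, {0, 1}), V[0])
```

Here the first two entries are held fixed and the remaining two are scaled by about 1.5, giving a row of roughly [0.5, 0.2, 0.15, 0.15].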

2.  AN  APPROXIMATION  FOR  V 

The calculation of V proceeds in the following steps:

(a) Insert the correct $V_{11}$ from the tables referenced above;

(b) Insert the correct $V_{12}$ from (4);

(c) Insert the rest of row 1 using the David and Johnson formulae;

(d) Keep $V_{11}$ and $V_{12}$ fixed, and normalize row 1. When row 1 has been calculated, fill in column 1, row n and column n from the symmetry relations (5).

(e) Apart from the terms already calculated in steps (a) through (d) (i.e., $V_{21}$ and $V_{2n}$), calculate row 2 from the David and Johnson formulae, and normalize row 2. Fill in column 2, row n-1 and column n-1 from the symmetry of V. Continue with successive rows until all rows are normalized.
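Steps (a) to (e) can be sketched as below. This is not the authors' Fortran program: the David-Johnson formulae are supplied by the caller as a function `approx(i, j)` (a hypothetical interface with 0-based indices), `m` and `v11` are assumed to come from the tables cited above, and the two helpers restate the symmetry relations (5) and the normalization defined earlier.

```python
from typing import Callable, List, Optional, Sequence

Matrix = List[List[Optional[float]]]


def fill_symmetric(V: Matrix, i: int, j: int, value: float) -> None:
    """Write value to V_ij, V_ji, V_rs, V_sr (relations (5), 0-based)."""
    n = len(V)
    r, s = n - 1 - i, n - 1 - j
    for a, b in ((i, j), (j, i), (r, s), (s, r)):
        V[a][b] = value


def normalize_row(V: Matrix, i: int, fixed: set) -> None:
    """Scale the entries of row i outside `fixed` so that the row sums to 1."""
    free = [j for j in range(len(V)) if j not in fixed]
    c = (1.0 - sum(V[i][j] for j in fixed)) / sum(V[i][j] for j in free)
    for j in free:
        V[i][j] *= c


def approximate_v(n: int, m: Sequence[float], v11: float,
                  approx: Callable[[int, int], float]) -> Matrix:
    """Steps (a)-(e): exact V_11 and V_12, caller-supplied approximations
    elsewhere; each row is normalized, then mirrored with (5)."""
    V: Matrix = [[None] * n for _ in range(n)]
    fill_symmetric(V, 0, 0, v11)                                    # step (a)
    fill_symmetric(V, 0, 1, v11 + m[0] ** 2 - m[0] * m[1] - 1.0)  # step (b), eq. (4)
    for i in range(n):
        fixed = {j for j in range(n) if V[i][j] is not None}
        free = [j for j in range(n) if V[i][j] is None]
        if not free:
            continue  # row already complete from symmetry with earlier rows
        for j in free:
            V[i][j] = approx(i, j)                                  # steps (c), (e)
        normalize_row(V, i, fixed)                                  # steps (d), (e)
        for j in free:
            fill_symmetric(V, i, j, V[i][j])
    return V
```

For n = 3 the identities already determine every entry, so any positive stand-in for `approx` returns the exact matrix; for larger n the quality of the result depends on the approximation passed in.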

These operations make the top left corner of V correct, and the rows more accurate than before; but the trace will not satisfy (3). This identity can be used to give further improvement as follows:

(f) Change $V_{22}$ and $V_{pp}$ (p = n-1), keeping them equal, so that (3) is satisfied; then renormalize row 2 with $V_{21}$, $V_{22}$ and $V_{2n}$ fixed. Fill in the symmetric terms in columns 2 and n-1 and row n-1.

(g) Renormalize successively all rows as for row 2; i.e., leave fixed the diagonal term and the terms calculated from symmetry relations with previous rows. The entire matrix V will be complete when row n/2 is renormalized, for n even, or row (n-1)/2, for n odd. In the latter case (n odd), the middle row will not satisfy (1). The procedure could be iterated to improve this, but our experience suggests that this is not necessary.
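Steps (f) and (g) can be sketched in the same style. The function `apply_trace_identity` below is a hypothetical helper with 0-based indices that takes a matrix already produced by steps (a) to (e) together with the $m_i$. It assumes n ≥ 3 and, for n = 3, where $V_{22}$ and $V_{pp}$ are the same entry, applies the whole trace correction to that one entry.

```python
from typing import List, Sequence


def fill_symmetric(V: List[List[float]], i: int, j: int, value: float) -> None:
    """Write value to V_ij, V_ji, V_rs, V_sr (relations (5), 0-based)."""
    n = len(V)
    r, s = n - 1 - i, n - 1 - j
    for a, b in ((i, j), (j, i), (r, s), (s, r)):
        V[a][b] = value


def normalize_row(V: List[List[float]], i: int, fixed: set) -> None:
    """Scale the entries of row i outside `fixed` so that the row sums to 1."""
    free = [j for j in range(len(V)) if j not in fixed]
    c = (1.0 - sum(V[i][j] for j in fixed)) / sum(V[i][j] for j in free)
    for j in free:
        V[i][j] *= c


def apply_trace_identity(V: List[List[float]], m: Sequence[float]) -> None:
    """Steps (f)-(g): enforce tr(V) = n - sum m_i^2, then renormalize rows."""
    n = len(V)
    if n < 3:
        raise ValueError("sketch assumes n >= 3")
    target = n - sum(mi * mi for mi in m)            # identity (3)
    excess = target - sum(V[k][k] for k in range(n))
    p = n - 2                                         # 0-based index of V_pp, p = n - 1
    if p == 1:
        V[1][1] += excess                             # n = 3: V_22 is V_pp
    else:
        V[1][1] += excess / 2.0                       # step (f): V_22 and V_pp kept equal
        V[p][p] = V[1][1]
    # Rows 2, 3, ... (1-based) up to n/2 or (n-1)/2: keep the diagonal and the
    # entries already fixed by symmetry with earlier rows, rescale the rest.
    for i in range(1, n // 2):
        fixed = set(range(i + 1)) | set(range(n - i, n))
        normalize_row(V, i, fixed)
        for j in range(i + 1, n - i):
            fill_symmetric(V, i, j, V[i][j])
```

The loop bound `n // 2` stops at row n/2 (n even) or row (n-1)/2 (n odd) in the 1-based numbering of the text, so for odd n the middle row is left unnormalized, as noted in step (g).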

3.  ACCURACY  OF  THE  METHOD 

When the David-Johnson formulae are used alone, by far the greatest error, for those values of n (≤ 20) for which comparisons can be made over the entire matrix V, occurs at $V_{11}$. For this particular entry we can, of course, extend comparisons to n = 120; the error is about 0.00440 at n = 20 (about 1.6%) and diminishes very slowly to 0.00395 at n = 120 (about 2.2%). In our computations we used the algorithm of Cunningham (1969) to give the inverse of the normal distribution; this gives computational errors much smaller than those in the approximation itself. The very slow decrease lends support to misgivings expressed by David and Johnson on the convergence properties, for extreme values, of their series. Comparisons of other terms are given in the Table; we have selected those terms where either the David-Johnson formulae used alone give the largest error (omitting $V_{11}$) or the new technique gives the largest error; for n = 15 and n = 20 these both occur at $V_{22}$.

TABLE

Comparison of True Values With Two Approximations. The Asterisk Means Maximum Error in the V Matrix for That Approximation.

                                David-Johnson alone    New method
n    Element     True value     Value      Error       Value      Error
10   $V_{23}$    .146623        .146423    .000200     .146588    .000035*
10   $V_{33}$    .175003        .174760    .000237*    .174998    .000005
15   $V_{22}$    .179122        .179271    .000149*    .179090    .000031*
18   $V_{13}$    .094617        .094546    .000072     .094653    .000036*
18   $V_{22}$    .166293        .166504    .000211*    .166279    .000014
20   $V_{22}$    .159573        .159809    .000236*    .159519    .000046*

The new method reduces the maximum error to about one-fifth of its previous value. From a percentage point of view, the new approximation gives its largest percentage error at $V_{1n}$, where the true covariance is smallest; this maximum percentage error is of the order of 0.15%; the maximum absolute error is generally less than 0.05%. A comparison of the relative sizes of the errors in the Table with those quoted above for $V_{11}$ shows how important it is to have the exact values of $V_{11}$ to make a good start in approximating V. We have suggested the upper limit n = 120 because exact values of $V_{11}$ are known up to this point. However, $V_{11}$ approaches zero like $1/\ln n$; in fact, asymptotically $V_{11} \ln n$ has limit $\pi^2/12 = 0.82$ (Cramér, 1946, p. 376), though $V_{11} = 0.85/\ln n$ gives a more accurate approximation in the region $110 \le n \le 120$ (error less than 0.0001).

Thus, the use of the algorithm could be extended. In order to use the David-Johnson formulae a computer would be needed; the steps given above can be very easily programmed, and it seems worthwhile to get the extra accuracy, particularly if, as in some applications, the inverse of V is required. A Fortran program for the entire procedure is available from the authors.
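The two large-n forms for $V_{11}$ mentioned in Section 3 are easy to compare numerically. The sketch evaluates the limiting form $\pi^2/(12 \ln n)$ and the empirical form $0.85/\ln n$; the function names are our own. Beyond n = 120 neither form has been checked against exact values here, so using them to seed step (a) for larger n is an assumption, not a tested extension.

```python
import math


def v11_limit_form(n: int) -> float:
    """pi^2 / (12 ln n): from the limit of V_11 ln n (Cramer, 1946)."""
    return math.pi ** 2 / (12.0 * math.log(n))


def v11_empirical_form(n: int) -> float:
    """0.85 / ln n: stated in the text to be within 0.0001 for 110 <= n <= 120."""
    return 0.85 / math.log(n)


if __name__ == "__main__":
    for n in (110, 120, 200, 500):
        print(n, round(v11_limit_form(n), 5), round(v11_empirical_form(n), 5))
```

At n = 120 the empirical form gives about 0.1775, slightly above the limiting form, which approaches the true value only very slowly.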